
Calhoun 

iniQiuiic^iul Ar{hiv« of tilt Mil vdl Poii^roduiit School 


Calhoun: The NPS Institutional Archive 
□Space Repository 



Theses and Dissertations 


1. Thesis and Dissertation Collection, all items 


2006-12 

Application of neural networks to predict 
UH-60L electrical generator condition using 
(IMD-HUMS) data 

Tourvalis, Evangelos. 

Monterey, California. Naval Postgraduate School 


http://hdl.handle.net/10945/2413 


Downloaded from NPS Archive: Calhoun 



DUDLEY 

KNOX 

LIBRARY 


htt p://w ww. n ps. e du/l ib ra ry 


Caflwuo is the Naval Postgraduate School's public access digital repository for 
research mate rials and institutiional putilicatians created by the NPS community. 
Calhoun is named for Professor of Mathematics Guy K. Caftiouo, NPS's first 
appointed — and putJlished — schoteily author. 

Dudley Knox Library / Naval Postgraduate School 
411 Dyer Road / 1 Univefsity Circle 
Monterey, California USA 93943 







NAVAL 

POSTGRADUATE 

SCHOOL 

MONTEREY, CALIEORNIA 


THESIS 


APPLICATION OF NEURAL NETWORKS TO PREDICT 
UH-60L ELECTRICAL GENERATOR CONDITION USING 
(IMD-HUMS) DATA 

by 

Evangelos Tourvalis 
December 2006 

Thesis Advisor: Lyn R. Whitaker 

Second Reader: Samuel E. Buttrey 


Approved for public release; distribution is unlimited 




THIS PAGE INTENTIONALLY LEET BLANK 



REPORT DOCUMENTATION PAGE 


FormApprovedOMBNo^ 0704-018^^ 
Public reporting burden for this collection of information is estimated to average 1 hour per response, including 
the time for reviewing instruction, searching existing data sources, gathering and maintaining the data needed, and 
completing and reviewing the collection of information. Send comments regarding this burden estimate or any 
other aspect of this collection of information, including suggestions for reducing this burden, to Washington 
headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 
1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project 
(0704-0188) Washington DC 20503. 

I. AGENCY USE ONLY (Leave blank) 2. REPORT DATE 3. REPORT TYPE AND DATES COVERED 

December 2006 Master’s Thesis 

4. TITLE AND SUBTITLE: Application of Neural Networkss to Predict UH-60L 5. PENDING NUMBERS 
Electrical Generator Condition using (IMP-HUMS) data _ 

6. AUTHOR(S) Tourvalis Evangelos 

7. PERPORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 8. PERPORMING 

Naval Postgraduate School ORGANIZATION REPORT 

Monterey, CA 93943-5000 _ NUMBER _ 

9. SPONSORING /MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSORING/MONITORING 
N/A AGENCY REPORT NUMBER 

II. SUPPLEMENTARY NOTES The views expressed in this thesis are those of the author and do not reflect the official 

policy or position of the Department of Defense or the U.S. Government. _ 

12a. DISTRIBUTION / AVAll ABllTTY STATEMENT 12b. DISTRIBUTION CODE 

Approved for public release; distribution is unlimited 

13. ABSTRACT (maximum 200 words) 

In 2003, the US Army began using the Integrated Mechanical Diagnostics Health and Usage Management System 
(IMD-HUMS), an integrated airborne and ground-based system developed by Goodrich Corporation, to support maintenance of 
the UH-60L. IMD-HUMS is responsible for collecting, processing, analyzing, and storing an enormous amount of vibratory 
and flight regime data obtained from sensors located throughout the aircraft. 

The purpose of this research is to predict failures of the UH-60L’s electrical generators, applying Artificial Neural 
Networks (ANN) on the IMD-HUMS-produced data. Artificial NNs are data based vice rule based, thereby possessing the 
potential capability to operate where analytical solutions are inadequate. They are reputed to be robust and highly tolerant of 
noisy data. Software tools such as Clementine 10.0, S-Plus 7.0, and Excel are used to establish these predictions. 

This research has verified that ANNs have a position in machinery condition monitoring and diagnostics. However, 
the limited nature of these results indicates that ANNs will not solve all machinery condition monitoring and diagnostics 
problems by themselves. They certainly will not completely replace conventional rule-based expert systems. Ultimately, it is 
anticipated that a symbiotic combination of these two technologies will provide the optimal solution to the machinery condition 
monitoring and diagnostics problem. _ 

14. SUBJECT TERMS Condition Based Maintenance, IMD-HUMS, ANNs, Backpropagation, 15. NUMBER OP 

Learning Process PAGES 

_99_ 

16. PRICE CODE 

17. SECURITY 18. SECURITY 19. SECURITY 20. LIMITATION 

CLASSIPICATION OP CLASSIPICATION OP THIS CLASSIPICATION OP OP ABSTRACT 

REPORT PAGE ABSTRACT 

_ Unclassified _ Unclassified _ Unclassified _UL_ 

NSN 7540-01-280-5500 Standard Form 298 (Rev. 2-89) 

Prescribed by ANSI Std. 239-18 


1 




























THIS PAGE INTENTIONALLY LEET BLANK 


11 



Approved for public release; distribution is unlimited bold 


APPLICATION OF NEURAL NETWORKS TO PREDICT UH-60L 
ELECTRICAL GENERATOR CONDITION USING (IMD-HUMS) DATA 

Evangelos Tourvalis 
Major, Hellenic Air Force 

Submitted in partial fulfillment of the 
requirements for the degree of 


MASTER OF SCIENCE IN OPERATIONS RESEARCH 


from the 


NAVAL POSTGRADUATE SCHOOL 
December 2006 


Author: Evangelos Tourvalis 


Approved by: Eyn R. Whitaker 

Thesis Advisor 


Samuel E. Buttrey 
Second Reader 


James N. Eagle 

Chairman, Department of Operations Research 



THIS PAGE INTENTIONALLY LEET BLANK 


IV 



ABSTRACT 


In 2003, the US Army began using the Integrated Mechanical Diagnostics Health 
and Usage Management System (IMD-HUMS), an integrated airborne and ground-based 
system developed by Goodrich Corporation, to support maintenance of the UH-60L. 
IMD-HUMS is responsible for collecting, processing, analyzing, and storing an 
enormous amount of vibratory and flight regime data obtained from sensors located 
throughout the aircraft. 

The purpose of this research is to predict failures of the UH-60L’s electrical 
generators, applying Artificial Neural Networks (ANN) on the IMD-HUMS-produced 
data. Artificial NNs are data based vice rule based, thereby possessing the potential 
capability to operate where analytical solutions are inadequate. They are reputed to be 
robust and highly tolerant of noisy data. Software tools such as Clementine 10.0, S-Plus 
7.0, and Excel are used to establish these predictions. 

This research has verified that ANNs have a position in machinery condition 
monitoring and diagnostics. However, the limited nature of these results indicates that 
ANNs will not solve all machinery condition monitoring and diagnostics problems by 
themselves. They certainly will not completely replace conventional rule-based expert 
systems. Ultimately, it is anticipated that a symbiotic combination of these two 
technologies will provide the optimal solution to the machinery condition monitoring and 
diagnostics problem. 
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EXECUTIVE SUMMARY 


Readiness is a key factor for military forces to stay effective and reliable in a 
continuously growing and demanding environment. Increased readiness can be achieved 
by increasing availability through performing efficient maintenance, performing less 
corrective maintenance actions, and identifying more accurate preventive maintenance 
periods. Today, the United States and allied forces spend billions of dollars for time or 
phased maintenance periods that overlook several facts and realities of operational use. 
Important savings can be gained by using hardware and software to evaluate component 
health and the conditions of systems based on operational usage and performing 
maintenance in relation to statistical and engineering analyses that predict availability and 
readiness. 

Nowadays, the majority of maintenance processes are accomplished by either the 
predetermined preventive or the corrective approach. The former approach has fixed 
maintenance intervals; the latter is performed after the fault of the component. Because 
both approaches are costly, some industries have started to perform maintenance action in 
a predictive manner. Condition Based Maintenance (CBM), where the condition is the 
key parameter to set the maintenance intervals and appropriate maintenance tasks. 

Condition Based Maintenance (CBM) is a technology weapon that tries hard to 
recognize initial faults before they develop into critical failures, which permits more 
precise scheduling of the preventive maintenance. The causes that have motivated a boost 
in the action of CBM include the need for reduced maintenance and logistics costs, 
protection against failure of mission-important equipment, and upgraded equipment 
availability. 

In 2003, the US Army began using the Integrated Mechanical Diagnostics Health 
and Usage Management System (IMD-HUMS), an integrated airborne and ground-based 
system developed by Goodrich Corporation, to support maintenance of the UH-60L. 
IMD-HUMS is responsible for collecting, processing, analyzing, and storing an 
enormous amount of data obtained from sensors located throughout the aircraft. The 
IMD-HUMS improves aircraft availability for operators by identifying potential 
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problems early so that maintenanee ean be performed before it beeomes an issue that 
eould impaet flight operations. The system also provides operators with aeeurate flight 
parameter data, monitored automatieally on eaeh flight, allowing them to better sehedule 
routine maintenanee and, in some eases, avoid unneeessary early repair and overhaul. 

Neural networks are used in numerous fields, ineluding medieal diagnosties. In 
this thesis neural networks are used for maehinery diagnosties and speeifieally for 
diagnosing the UH-60L helieopter’s eleetrieal generator. In order to aeeomplish this, a 
database eolleeted from IMD-HUMS is used. The emphasis in this thesis is to develop a 
neural network that would utilize the eolleeted data from IMD-HUMS, manufaetured by 
Goodrieh Corporation, in order to diseover patterns that would prediet a potential failure 
of a UH-60L helieopter generator. Many different neural networks are evaluated for their 
sueeess rate for this faulting diagnosis. 

As in any predietion/foreeasting model, the seleetion of appropriate model inputs 
is extremely important. However, in most ANN Artifiaeial Neural Network) applieations, 
less attention is given to this task. The main reason for this is that ANNs belong to the 
class of data-driven approaches, whereas conventional statistical methods are model 
driven. In the latter, the structure of the model has to be determined first, which is done 
with the aid of empirical or analytical approaches, before the unknown model parameters 
can be estimated. Data-driven approaches, on the other hand, have the ability to 
determine which model inputs are critical, so there is less need for “...a priori 
rationalization about relationships between variables...” However, presenting a large 
number of inputs to ANN models and relying on the network to determine the critical 
model inputs usually increases network size. This has a number of disadvantages, such as 
decreasing processing speed, increasing the amount of data required to estimate the 
connection weights efficiently and degrading performance of the AAN. This is 
particularly true for complex problems, where the number of potential inputs is large and 
where no a priori knowledge is available to suggest which inputs to include. 

Clementine which is the software used in this research, incorporates several 
features to avoid some of the common pitfalls of ANNs, including sensitivity analysis, 
network accuracy, and feedback graph. With these options selected, a sensitivity analysis 



will provide information on which input fields are most important in predicting the output 
field, a network accuracy will provide the percentage of records for which the prediction 
of the model matches the observed value in the data, and the feedback graph will depict 
the accuracy of the network over time as it learns. 

In practice, building an ANN forecasting model involves a lot of trial and error. 
Consequently, the objective of this thesis is to provide a practical, non-technical 
introduction to structure an ANN forecasting model using real operating data of UH-60L 
helicopters. The success of ANN applications for an individual researcher depends on 
three key factors. First, the researcher must have the time, patience, and resources to 
experiment. Second, the ANN software must allow automated routines, such as walk- 
forward testing, optimization of hidden neurons, and testing of input variable 
combinations—either through direct programming or the use of batch/script files. Third, 
the researcher must maintain a good set of records that lists all parameters for each 
network tested. 

This research has verified that ANNs have a position in machinery condition 
monitoring and diagnostics. However, the limited nature of these results indicates that 
ANNs will not solve all machinery condition monitoring and diagnostics problems by 
themselves. They certainly will not completely replace conventional rule-based expert 
systems. Ultimately, it is anticipated that a symbiotic combination of these two 
technologies will provide the optimal solution to the machinery condition monitoring and 
diagnostics problem. 
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I. INTRODUCTION 


Readiness is a key factor in enabling military forces to stay effective and reliable 
in a continuously growing and demanding environment. Increased readiness can be 
achieved by increasing availability through performing efficient maintenance, performing 
fewer corrective maintenance actions, and identifying more accurate preventive 
maintenance periods. Today, the United States and allied forces spend billions of dollars 
on time or phased-maintenance approaches that overlook several facts and realities of 
operational use. Important savings can be gained by using hardware and software to 
evaluate component health and the conditions of systems based on operational usage and 
performing maintenance in relation to statistical and engineering analyses that predict 
availability and readiness. 

The emphasis in this thesis is to develop a neural network that utilizes data 
collected from IMD-HUMS, manufactured by Goodrich Corporation, in order to discover 
patterns that can predict a failure of a UH-60L helicopter generator. Many different 
neural networks will be evaluated for their success rate on this faulting diagnosis. 

A, CONDITION BASED MAINTENANCE 

Maintenance is usually carried out in either time-based scheduled periods (so- 
called preventive maintenance) or by corrective maintenance. Preventive maintenance 
aims to avoid system or component failure by performing repair, service, or replacement 
within the fixed time intervals. On the other hand, corrective maintenance is performed 
after the failure or when an apparent fault has taken place (Davis, A., 1998). For several 
types of equipment or systems the maintenance action must be done without delay, but 
for many others it can be delayed depending on the equipment’s function. In many cases 
the preventive maintenance can be divided into two groups: Condition-Based 
Maintenance (CBM) and Predetermined Maintenance (PM). PM is scheduled in time, 
while CBM mostly has dynamic or on-request intervals (Figure 1). 
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Figure 1. Overview of Maintenance Terminology 

Entire CBM schemes involve a number of efficient capabilities, like sensing and 
data acquisition, signal processing, condition and health estimation, prognostics, and 
decision assistance. Moreover, in order for the user to have access to the system, a 
Human System Interface (HSI) development is necessary. Generally, the integration of 
various hardware and software components is needed to implement a CBM system. 

A complete architecture for CBM systems should cover the range of functions 
from data collection through the recommendation of specific maintenance actions. The 
major tasks that assist CBM consist of (http://www.osacbm.org) : 

• Sensing and data acquisition 

• Signal processing and feature extraction 

• Production of alarms or alerts 

• Fault or failure diagnosis and health evaluation 

• Prognostics: projection of health profiles to future health or estimation of 
remaining useful life 

• Decision aiding: maintenance recommendations, or evaluation of asset 
readiness for a particular operational setting 

• Management and control of data flows or test sequences 
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• Management of historical data storage and historical data access 

• System configuration management 

• Human system interface. 

CBM makes use of information collected on equipment through monitoring 
devices. As equipment becomes more complex, more manufacturers are providing these 
monitoring devices to assist companies or organizations handle and maintain their 
equipment (Tsang, A., 1995). CBM uses this online data to compare equipment 
conditions to predefined operating thresholds. Data that happen to fall outside these 
thresholds generates a maintenance alert by the software that signals a problem or area of 
concern. 

B, IMD-HUMS 

In 2003, the US Army began using the Integrated Mechanical Diagnostics Health 
and Usage Management System (IMD-HUMS), an integrated airborne and ground-based 
system developed by Goodrich Corporation, to support maintenance of the UH-60L. 
IMD-HUMS is responsible for collecting, processing, analyzing, and storing an 
enormous amount of data obtained from sensors located throughout the aircraft. The 
IMD-HUMS improves aircraft availability for operators by identifying potential 
problems early so that maintenance can be performed before it becomes an issue that 
could impact flight operations. The system also provides operators with accurate flight 
parameter data, monitored automatically on each flight, allowing them to better schedule 
routine maintenance and, in some cases, avoid unnecessary early repair and overhaul. 
The IMD-HUMS consists of two main subsystems: the On-Board System (OBS) and the 
Ground Station System (GSS) (System Users Manual For IMD-HUMS, 1995). 

1. On-Board System (OBS) 

The OBS is comprised of the following components (Figure 2): 

• Cockpit display unit (CDU) 

• Data transfer unit (DTU) 

• Remote data concentrator (RDC) 

• Main processor unit (MPU) 

• 2 j unction boxe s (JB1 /JB 2) 

• 20 drive train and gearbox accelerometers 
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• 4 engine accelerometers 

• 5 trim and balance accelerometers 

• 1 4g body accelerometer for regime recognition 

• Main and tail rotor magnetic RPM sensors 

• Main rotor blade tracker 

• Engine output shaft optical tachometers. 

The heart of the IMD-HUMS OBS is the Main Processing Unit (MPU). The MPU 
collects the data from the accelerometers, analyzes the inputs, and records the data, 
seeking for vibration exceedances and events. It calculates time spent in various flight 
regimes, performs various diagnostic algorithms, and stores the data to an onboard data 
cartridge. The OBS also provides for crew interaction through a Cockpit Display Unit 
(CDU) in order to support prompted procedure actions related to power assurance checks, 
power train analyses, and rotor track and balance data acquisitions. Besides prompted 
actions, the OBS uses regimes information to automatically store power train and rotor 
vibration data. 

2. Ground Station System (GSS) 

The GSS is the major user interface for the IMD-HUMS. It performs after-flight 
debrief and is designed to analyze, process, and compile flight data into useful 
information for the maintenance crew, logistics teams, the operations department, and 
engineering support. The IMD-HUMS GSS functions include: 
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Figure 2. OBS & GSS (From: IMD-HUMS User Manual, 2005) 
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• Rotor Time and Balance 

• Strip Charts of Aircraft Data 

• Engine Performance 

• Trending 

• Usage Computation and Tracking 

• Regime Identification and Processing 

• Flight Operations Management 

• Fault/BIT Display 

• Maintenance Management 
C. PREVIOUS WORK 

Willard and Klesch in their 2005 thesis, used 36,742 observations from monitored 
components of 30 UH-60F helicopter’s generators. The data was collected during the 
two-year period where the IMD-HUMS were installed. Each IMD-HUMS acquisition 
concerning the shaft, spur gear, and bearing of generators results in 170 variables. Each 
generator is assigned a binary value 1 or 0 to classify its known state. The value of one 
was given to the generators that were removed for fault, hence referred as bad generators. 
The value of zero was given to the generators that were not removed, referred to as good 
generators. To accomplish this generator classification, maintenance records and 
photographs from the 10U‘ AVN Division were used. 

Principal components and other techniques were applied to reduce the 170 initial 
predictors to only 10. A logistic regression model and random forest classifiers were used 
on each generator, and the plotted probabilities of being bad were smoothed and used to 
predict the current functional condition of generators in the test set. Only Condition 
Indicators (Cl) computed in the last 20 observations of each generator were used in the 
predictive models because generators classified as bad were not necessary bad through 
their entire two-year history. Due to the highly variable nature of the predictor values, the 
model had lower success predicting states with just one acquisition. One the other hand, 
some surprising cases of generators which were wrongly presumed to be bad and, 
conversely, another generator which was wrongly assumed to be good, were classified 
correctly by this study’s approach. 
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D, AREA OF RESEARCH AND APPROACH 

ANNs have a number of traits that make them an attractive alternative to 
conventionally configured expert systems. First, many are capable of discriminating non¬ 
linear relationships. Second, they are capable of functioning with a certain degree of 
background noise and erroneous information with minimal degradation of their pattern 
recognition abilities. Third, they have the ability to generalize, having the ability to 
classify previously unseen vector patterns into existing and, in some cases, new output 
categories. They are also capable of identifying multiple faults. These are all areas where 
traditional expert systems typically fall short. Moreover, ANNs are data-based rather than 
rule-based. This means that they may be capable of correctly discriminating relationships 
previously hidden from the best of “experts”. 

ANNs are not without their disadvantages. They, like all computer algorithms, are 
capable only of manipulating numbers and require an engineer to discern the intelligence 
of their output. Their success is largely limited to the quality of the data that they are 
provided. If the input vectors provided are inadequate to describe the decision space 
fully, then their likelihood for success is small. Again, they require an engineer to provide 
the proper inputs. Finally, they may be able to distinguish new relationships, but the 
relationships themselves remain hidden; all that is seen external to the network are the 
input and the output vectors. It is generally believed that the relationships are somehow 
hidden in the connection weights and the hidden layers but meaningful extraction of this 
information has yet to occur. 

ANNs appeared to have potential in numerous fields, including machinery 
diagnostics. The question might be asked whether an ANN should theoretically be 
capable of recognizing patterns in vibration signatures. It is the scope of this research to 
determine whether this potential can be realized in the region of machinery diagnostics 
and specifically for the UH-60L helicopter’s electrical generator. In order to accomplish 
this, a database collected from IMD-HUMS is be used. Pattern recognition is an essential 
component of rotating machinery condition forecasting; therefore, examining and 
training different model structures and shapes in trying to identify patterns that are 
“storing” the “weights” of the networks is being researched. 
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E, STATISTICAL TOOLS 

ANN software prices start at a few hundred dollars and can go to hundreds of 
thousands, or even more. The most expensive ones are generally packaged with more 
complete data mining products, which contain ANNs as one of the capabilities offered. 
This research utilizes one of those statistical packages, Clementine 10.0 produced by 
SPSS. This product is designed to function on servers and networks and has the ability to 
handle massive databases. The software also provides some handy features useful in 
evaluating the function fit by the NN, such as a computation of sensitivities. Through this 
software, the researcher had the opportunity to apply and, at the same time, train a 
number of different kinds of ANNs, in an effort to find the most suitable for the database 
of interest. 

In addition, S-PLUS 7.0 from Insightful and Excel from Microsoft, two 
inexpensive and well-spread tools were used to assist the models in this research. 

F. ORGANIZATION OF STUDY 

This thesis begins with Chapter I, which briefly introduces the reader to the 
modem concept of CBM and the tools available to support it, like IMD-HUMS 
manufactured by Goodrich Corporation. A summary of previous relative work is 
provided, along with the author’s area of interest and the tools utilized to achieve the 
objective of this research. 

Chapter II is dedicated to the traditional explanation of ANNs. Theory and 
pictures are used concurrently trying to clarify what is commonly known to ANNs as a 
“black box” solution. Data enter the “black box” and a prediction comes out of it. 

Chapter III describes the database, procedures that are followed to clean and 
choose the final data in use, and presents the steps of the methodology that leads to the 
result of this research. 

Chapter IV briefly presents and discusses the results and outputs of the models 
that were trained in supporting the goal of this research. Most of the structures of these 
models are summarized in Appendix B. 

Finally, Chapter V summarizes the conclusions and recommendations for future 
research ideas associated with the database and used techniques. 
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II. ARTIFICIAL NEURAL NETWORKS OVERVIEW 


An Artificial Neural Network (ANN) is an information processing system that has 
certain performance characteristics in common with biological ANNs. They are parallel 
in nature and the fundamental idea behind them is that, if it works in nature, it must be 
able to work in computers. ANNs are data-based vice rule-based, so they possess the 
potential of being able to operate where analytical solutions are inadequate. They are 
reputed to be robust and extremely tolerant of noisy data. 

A, HISTORY 

The ANN concept was first introduced in 1943 by W. McCullock and W. Pitts, 
who, while trying to describe how the brain’s neurons might work, modeled an ANN 
using electrical circuits. In 1949, D. Hebb introduced the training of the ANNs in his 
book “The Organization of Behavior,” in which he argued that if two nerves fire at the 
same time, their connection is strengthened and thereby it is also possible that the same 
two nerves will fire again. By the 1950’s, while computers became more advanced, 
researchers were eventually able to simulate such a hypothetical ANN. N. Rochester of 
IBM laboratories was an early pioneer in this field but, unfortunately, his effort failed. 
During 1959 and 1962, B. Widrow and M. Hoff developed two models (ADALINE and 
MAD ALINE) that recognized binary patterns, introduced a new learning algorithm 
applying the Least-Mean-Squares (LMS) learning rule, and used it to train adaptive 
ANNs. 

Eor almost for two decades (1960-1980), interest in ANNs faded because of a lack 
of new ideas and computational power. In 1972, T. Kohonen and J. Anderson developed 
a similar network independently of one another, with both resulting in a collection of 
analog AD ALINE circuits. The first multi-layered network was developed in 1975, an 
unsupervised network. 

Since 1980, many researchers have boosted the idea and today ANNs are 
extremely popular as prediction and forecasting tools in a number of areas. The future of 
ANNs, though, lies in the development of hardware. 
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B, BIOLOGICAL NEURON 

In principal, the brain is composed of almost 10 billion neurons; each is attached 
to about 10,000 other neurons. The main segment of the cell is called the soma or cell 
body (Figure 3). Neurons have a large number of extensions called dendrites. Each 
neuron receives electrochemical signals from other neurons connected throughout 
different axons. At the synapses—between the dendrite and axons—electrical signals are 
modulated in various amounts. If the sum of these electrical signals is sufficiently 
influential to activate the neuron, it produced an electrochemical output along the axon, 
and passes this signal to the other neurons, whose dendrites are attached at any of the 
axon terminals (Norgaardm M., Ravn O., Poulsen N., Hansen L., 2000). These attached 
neurons may then fire. It is essential to point out that a neuron fires only if the entire 
signal received at the cell body surpasses a certain level. The neuron either fires or it 
doesn't; there aren't different levels of firing. 



Figure 3. A Biological Neuron (From: Fawrence, J., 1993) 

The human brain is composed of these interrelated, electrochemical, broadcasting 
neurons. From a huge number of extremely naive processing units (each carrying out a 
weighted sum of its inputs, and then firing a binary output if the total input exceeds a 
certain level) the brain controls very complex tasks. This is the model on which artificial 
ANNs are based. Although ANNs haven't even approximated modeling the 
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complexity of the human brain, they have appeared to be excellent at problems that are 
easy for a human but difficult for a conventional computer, such as image recognition 
and forecast based on earlier knowledge. 

C. ARTIFICIAL NEURON 

The artificial neuron is meant to mimic the major characteristics of the biological 
neuron. The three crucial mechanisms of the artificial neuron are: 

(1) The synapses or connecting links that give weights, w. to the input values, 
Xj for7=1, 2,..., m (Figure 4). 

(2) A summer that adds all the weighted input values to calculate the input to 

m 

the activation function v = + ’ where Wq called the bias (not to 

j=i 

be confused with statistical bias in prediction or estimation) which is an 
arithmetic value associated with the neuron. It is suitable to consider the 
bias as the weight for an input Xq whose value is constantly equivalent to 

m 

1, so that V = '^WjXj. 

i=o 

(3) An activation function g that maps v to g (v), the output value of the 
neuron. This function is a monotone function. 



Figure 4. An Artificial Neuron 

The artificial neuron has two stages of operation; the training stage and the using 
stage (Fausett, L., 1994). During the former, the neuron can be trained to fire or not, for 
specific input patterns or vector. In the latter, when a taught input pattern is identified at 
the input, its relayed output updates the current output. If the input pattern does not 
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belong in the taught list of input vector or patterns, the firing rule is used to determine 
whether to fire or not. Is important to state that the firing rule is related to all the input 
patterns, not only the ones on which the neuron was trained. 

D. ARCHITECTURE OF NEURAL NETWORKS 

1, Single Layer Networks (SLN) 

The structure of an artificial ANN consists of the ‘input layer’ connected to the 
‘hidden layer’, which is connected to the ‘output layer.’ Any action of the input units 
corresponds to the raw data that is fed into the network, while the action of each hidden 
unit is determined by the behavior of the input units and the weights on the links between 
the input and the hidden units (Ripley, B., 1996). The status of the output units depends 
on the activity of the hidden units and the weights between the hidden and output units 
(Figure 5). 



Figure 5. Single Layer Network 

This simple class of network is especially interesting because the hidden units are 
free to build their own versions of the input. The weights between the input and hidden 
units decide when each hidden unit is active, and so by adjusting these weights, a hidden 
unit can choose what it represents. 

2, Multi Layer Networks (MLN) 

The most frequent ANN model is the multilayer perceptron (MLP). This family of 
ANN is known as a supervised network because it needs a preferred output for learning 
purposes (Kartalopoulos, S., 1996). The goal of this network is to generate a model that 
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correctly matches the input to the output using chronological data so that the model can 
then be used to create the output when the desired output is not known. A graphical 
demonstration of an MLP is shown below (Figure 6). 



Input First Second Output 

Layer Hidden Hidden Layer 

Layer Layer 

Figure 6. Multi Layer Network 

3, Feed -Forward Networks (FFN) 

Feed-forward ANNs are the most accepted and extensively used models in several 
realistic applications. FFNs permit signals to pass through one way only: from input to 
output. There is no feedback or loop and they tend to be straight-forward networks that 
connect inputs with outputs. They are broadly used in pattern recognition. This structure 
of organization is also referred to as bottom-up or top-down. (Figure 7) 

4, Radial Basis Function Networks (RBFN) 

The input of a Radial Basis Function (RBF) network is nonlinear while the output 
is linear. Because of their nonlinear approximation properties, RBF networks are capable 
of modeling complex mappings, which perceptron ANNs can only model by using 
multiple intermediary layers (Bishop, C., 1995). To use an RBF network we need to state 
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the hidden unit activation function, the quantity of processing units, a criterion for 
modeling a given task, and a training algorithm for finding the parameters of the network. 



Figure 7. Feed Forward and RBF Network Representation 

E. LEARNING PROCESS 

One of the most essential phases of ANN is the learning process. Learning can be 
done in a supervised or unsupervised way. 

1, Supervised Learning 

In supervised learning, both the inputs and the outputs are known. During the 
training, the net runs the inputs and compares its resulting outputs against the known 
outputs. The differences of this comparison (the errors) are calculated, and the system 
adjusts the weights that manage the network. The aim is to establish a set of weights that 
minimizes the error. One famous method, which is frequent to many learning procedures, 
is the Least Mean Square (LMS) convergence (Duda, R., 2000). In this case, the network 
learns “offline” because the learning and the operation stages are different. Supervised 
learning can be subdivided into the following three general types: 
a. Hebbian Learning 

Hebbian learning is based on the premise that those connections that 
receive the most signal energy should in turn be strengthened. In this type of ANN, 
connection weights increase in a manner proportional to the magnitude of the signals 
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provided that both the input through the path and the desired output are high. While 
historically important and neurologically accurate, it is not widely used in neural 
computing applications. 

b. Delta Rule Learning 

Today, the most frequent form of learning in use is the delta rule. Here, 
weights are adjusted based on a direct comparison between the actual and desired 
outputs. Back propagation is one learning rule based on the generalized delta rule: 

( 2 . 1 ) 

where is the weight of the connection from the element in the current layer to the 
f’' element of the previous layer; Cj,C 2 ,andC 3 are coefficients varying from 0 to 1; 
E.j is the error proportional to the difference between the actual and desired output of the 
network; My is the momentum term based on the difference between the previous weight 
of the given connection and the weight immediately prior to that; and Xy is the 

activation energy associated with that particular connection (Ripley, B., 1996). 

c. Competitive Learning 

Competitive learning is where the output of processing elements is 
weighted according to the magnitude of its response relative to those of other processing 
elements. The “winning” processing element weighting is then modified according to 
comparison between actual and desired outputs. Thus only the strongest activation 
energies are adjusted; weak signals get progressively weaker unless the magnitudes of 
their response become comparable to those of the “winners”. 

2. Unsupervised Learning 

In unsupervised training, the network is provided only with inputs and not with 
preferred outputs. In the training phase, an input pattern is applied to the input layer and 
the net is permitted to achieve equilibrium (“winner”). Thereby, weight changes are made 
according to some instructions. The model itself should then decide what features it will 
use to cluster the input data. This type of organization is also known self-organized or 
adoption. Here the network learns “online,” because it learns and operates at the same 
time. 
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3, Activation Functions 

The activation function is basically used to introduce nonlinearity to the net. The 
activation function g (•) transforms the presented input of an artificial neuron during its 
activation, and determines how influential should be the output from the neuron, based on 
the sum of the inputs. If the artificial neuron must mimic a biological neuron, the 
activation function g(») has to be a simple threshold function returning binary values. 
However, this is not always the approach that artificial neurons implement. Sometimes it 
is more powerful and efficient to have a smooth differentiable activation function. The 
output from this group of activation functions lies within the ranges of [0,1] or [-1,1], 
depending on which activation function is applied. Some cases, where the identity 
function is used as the activation function, do not have these restrictions. On while, inputs 
and weights have no boundaries and take values within the i? range (- 00 ,- 1 - 00 ), in 
practice they often have small values centered around zero or are rescaled to have such 
small values. 

As pointed out before, there are many different activation functions, but the most 
frequently used are the identity function (Figure 8), the sigmoid (Figure 9), and the 
hyperbolic tangent (Figure 10). It is obvious that all the functions should be differentiable 
because the back propagation (BP) algorithm requires this property in order to work out 
the data process within the network (Bishop, C., 1995). 

An identity function, also known as an identity map or an identity transformation, 
is a function which does not have any effect. It always returns the same value that was 
used as its argument. For ANNs, that is reflected by the absence of hidden layers 
(perceptron). 
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Figure 8. Identity Function 

A sigmoid function, also known as logistic, is an S-shaped curve that maps all 
input to the range [0, 1]. It has a limit of 0 as x approaches negative infinity, and 1 as x 
approaches infinity. 



Figure 9. Sigmoid Function 

A hyperbolic tangent function is analogous to a sigmoid, but it maps all of its 
input to the range [-1, 1]. It has a limit of -1 as x approaches negative infinity and 1 as x 
approaches infinity. The constants a and b define the possible output range (a) as well as 
the slope (b). The neuron transforms the stimulus in a nonlinear way, an essential 
precondition for solving a variety of highly complex problems. 
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Figure 10. Hyperbolic Tangent Function 


4, Gradient Descent 

As clarified previously, in training, a function estimator frequently decreases to 

result in a value of w that minimizes a scalar error function£'( w). This is a typical 
optimization issue and various methods have been developed to answer it. The most 
frequent one is gradient descent. Gradient descent consists in thinking that E is the height 
of a landscape on the weight space: to locate a minimum, beginning from an arbitrary 
point, march descending until a minimum is reached (Figure 11) (Sobajic D., 1993). 
From the figure we can see that gradient descent will not at all times converge to an 
absolute minimum of E, but only to a local minimum. In the majority of these cases, this 

local minimum is good enough, given a realistic initial value of w. 


E 



w 


Eigure IE Gradient Descent. 
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5, Back propagation Algorithm 

Back propagation (BP) is one of the earliest training algorithms, first developed 
by P. Werbos and is widely used for training supervised networks. The goal of BP is to 
minimize the square error of the predictions over all the observations (Fausett, L., 1994). 
Based on this algorithm, the output error is assumed to be collectively contributed by all 
connection weights. Basically, at the center of the algorithm, an application of the chain 
rule for ordered partial derivatives takes place to compute the sensitivity that a cost 
function has with respect to the environment and weights of the net. 

Initial weights are usually chosen as small random values, so that each neuron 
will adapt a different set of weights. The network’s input to a node 7 is resolved by 

summing the weights of its inputs 




( 2 . 2 ) 


where v, designates the input in one node, and w, j the weight from node i to node j . 
Next, the node’s threshold valued (bias) is added to net’s input Zj and the calculated 
value is filtered through an activation function, usually a sigmoid function (Figure 12): 


F(z^) = 


a -Z j \ 

+ e ' ') 


(2.3) 



Figure 12. Sigmoid Function 
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The sigmoid function is also known as a ’’squashing” function because it maps its 
inputs on a preset range number between 0 and 1. 

The learning procedure in a BP net has two stages. At the first stage, each input 
pattern 1 ^ is provided to the net sequentially, and propagated forward until the output. In 

the second stage, a technique called “gradient descent” is applied to minimize the total 
error on the input patterns within the training set. During this technique, weights are 
altered in proportion to the negative of an error derivative with respect to each weight 

dF 

] (2.4) 

^j,i 

Then, the weights moving toward the steepest descent of the error surface, which 
is defined by the total error 

(«) 

where j denotes as the responding output of node j to pattern p , and j is the target 
output for node j . After the error on each pattern is calculated (1.2), all the weights are 
readjusted in proportion to this error, and back-propagated from the outputs to the inputs, 
applying the gradient descent method. The new calculated weights decrease the overall 
error in the net. The idea of gradient descent using only a single weight is presented in the 
following picture. 


Error 



Figure 13. Gradient Descent Using One Weight 
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An application of the chain rule is used to develop the BP learning rule, while 
reworking the error gradient for each pattern as the product of two partial derivatives. 
The first partial derivative represents the change in error as a function of the network 
mpuidE^Idw.., while the second partial derivative represents the effect of a weight 

change on a change in the network input dz^ j / dwj ■. Modifying the error gradient turns 
into 




dE^ 

= [^]*[ 




dw, 


( 2 . 6 ) 


Using the equation (2.2) for the net input Zj to a node j, we can solve directly for 
the second partial derivative and derive the network’s output ;for the pattern p to the 
node i : 


dz 




PJ 


dwi 


= [- 


dW: 


-] = ^P.i 


(2.7) 


Naming the negative of the first partial derivative as the error signal: 

dz„ 


( 2 . 8 ) 


the corresponding change in the weight w, j with respect to the error E^ becomes 






(2.9) 


where rj is a parameter describing the learning rate. The speed and accuracy of the 
learning process during the iterations to update the weights also depend on the learning 
rate rj. A low learning rate can guarantee more stable convergence, but a high learning 
rate can accelerate convergence in several cases. 

The next step in the BP algorithm is to calculate the j for each node in the net. 
The equation (1.8) can be rewritten as: 
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( 2 . 10 ) 


dE do 


do 


p,j 


'dz 


pj 


To compute the first partial derivative there are two cases to examine: 
a. First Case 

Assume that j is an output node of the net; then, from equation (1.5) it 

follows that: 


dE^ 

- ^- = 2(t -o .) 


( 2 . 11 ) 


PJ 


substitute equation ( 1 . 11 ) to equation ( 1 . 8 ) becomes: 


( 2 . 12 ) 


PJ ^ PJ PJ' '' ^ PJ' 

b. Second Case 

Assume that is not an 7 output node of the net, then applying again the 
chain rule, we obtain: 

dE ^ dE dz u 

- - = y[ —= 

PJ P^^ P'J 

QE 

k u 


(2.13) 


=Z‘‘ 


p,k^k,j 


Combining now the above cases, we form an iterated process for 
calculating the signal error j for all nodes in the net. These errors can then be used to 

update its weights. 

As BP uses a gradient descent method, the corresponding net tracks the 
contour of an error surface with weight updates moving in the direction of the steepest 
descent. Assuming a plain net without hidden layers, it is easy to minimize the error 
using gradient descent because the error surface is bowlshaped (Figure 13). The net will 

always locate an optimal solution at the base of the bowl. Such optimal solutions are 
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called global minima. On the other hand, cases which are more complex require the 
existence of an extra hidden layer to carry out the solution to such difficult problems. 
Here, error surfaces become also complex, containing possibly many minima. Because 
some minima are deeper than others, it is possible that gradient descent will not locate a 
global minimum, and the network may be trapped in a local minimum, which is a 
suboptimal solution. 

It is clear that we want to avoid local minima while training a BP net. 
Although in some case this may be difficult to do, in practice it is essential to try to find 
how often and under what circumstances local minima occur. Moreover we have to study 
possible approaches for avoiding them. It is known in the ANNs theory that the more 
hidden layers you have in a net, the less possible you meet a local minimum during 
training. Although additional hidden nodes amplify the complexity of the error surface, 
the extra dimensionality increases the number of possible flee paths. 

The BP algorithm analyzed in this chapter only involves only weight 
changes that are proportional to the derivative of the error. As we mentioned before at 
equation (1.9), the increment of the learning rate contributes rj to an increment of the 
weight changes on each iteration, and the faster the net learns (although the magnitude of 
the learning rate can also control whether the net reaches a stable solution). If the learning 
rate gets too large, then the weight changes no longer approximate a gradient descent 
procedure and that often results in oscillation of the weights. Obviously, we want to get 
the largest learning rate without causing oscillation, achieving the best learning speed 
while minimizing the training time for the net. One technique that has been used is a 
small modification of the BP algorithm that contains a momentum term. 

The idea of momentum is that previous changes in the weights should 
control the current direction of movement in the weight space. This idea is implemented 
by the adjusted weight update rule: 

^w.. {ri + \) = s* + a^w.. {rj) (2.14) 

where rj is the learning rate. With momentum, if the weights begin to move in a specific 
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direction in their space, they tend to keep on moving in that direction. Momentum can aid 
the net to overcome a local minimum, in addition to speeding learning, particularly along 
extensive flat error surfaces. 

In Clementine, the default learning rate is 0.25 and the default momentum 
parameter is 0.9. When using BP for a series of problems, much smaller values than these 
are often used. For especially complex problems, a learning rate of 0.01 is very common. 

6. Efficient Algorithms 

Selecting the proper value for the learning rate rj is not an easy concept. If rj has 
a small value, then the learning procedure will be too slow. On the other hand, if r] is 
assigned a large value, the learning procedure may diverge. An acceptable value of rj can 
be established by trial and error, but this is a quite boring and wasteful process (Bishop, 
C., 1995). In order to deal with this problem, a large range of efficient learning methods 
has been developed. Here, we briefly present the most essential theoretical ideas 
underlying them. One of the most basic ideas for speeding up the learning procedure is to 
use the second-order information regarding the error function. Assuming a quadratic error 
in one dimension, the best learning rate is the inverse of the second order derivative 
(Figure 14), which can aid in designing capable learning methods. 
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(a) r)< 1/A 


(b) r, = 1/A 




Figure 14. Learning Rate effect on Gradient Descent (From: Fausett, L., 1994) 

If it is possible to calculate this second-order derivative, then it is feasible to 
achieve a good learning rate. Unhappily, the error function might not be quadratic at all. 
Therefore, setting the learning coefficient to the inverse of the second-order derivative 
only works near the optimum, in regions where the quadratic approximation is valid. But 
if the second-order derivative is negative, this does not work. For handling such 
circumstances, particular care must be taken. Moreover, a few other issues come up when 
the dimension of the weight space is bigger than one, which is the most common case in 
practice. The second order derivative is not a single number anymore, but a matrix called 
the Hessian and defined as: 
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dw„dw^ 

] 


(2.15) 


Sometimes it is probable to have dissimilar curvatures in different directions 
(Figure 15). This can generate a major problem if there is, for instance, a second 
derivative of one hundred (100) in one direction, and a second derivative of one (1) in 
another. In this case, the learning coefficient must be less than 0.002 in order to avoid 
divergence. This means that convergence will be very slow in the direction where the 
second derivative is one (1). This phenomenon is called ‘ill conditioning.’ Efficient 
algorithms frequently try to change the weight space in order to have the same curvatures 
in all directions. This has to be done cautiously so that instances where the curvature is 
negative work as well. Some of the best algorithms are QuickProp, Levenberg 
Marquardt, and Conjugate Gradient. 
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Figure 15. Ill Conditioning. (From: Bishop, C., 1995) 
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7, Batch Vs Incremental Learning 

In unsupervised learning, as described earlier, the error function is frequently 
defined as a sum of error terms over a finite amount of training samples that consist of 
pair vectors (input, output). Again, the error function is: 

E = Y,E, (2.16) 

i 

with 

( 2 - 17 ) 

Applying the steepest descent on E is known as “batch learning” (BL) for the 
reason that the gradient of the error has to be calculated on the full training set previous 
to weights modification. A different method to modify weights aiming to minimize E is 
“incremental learning” (IE). Within this method, the gradient descent steps are applied on 
El instead of E. Some other known names for IE are online or stochastic learning. Now 

the obvious question that arises is which of these methods is the best. The answer is not 
so simple and depends always on the particular problem to be solved. Here are a few of 
the tips to consider (Simpson P., 1996): 

a. Advantages of Incremental Learning (IL) 

• IL is usually faster, particularly when the training set is redundant. In 
situations where the training set has input and output patterns that are 
similar, BL wastes time calculating and adding similar gradients before 
setting one weight update. 

• IL often results in better outcomes. This happens because the randomness 
of IL generates noise in the weight updates. This noise aids weights to 
jump out of bad local optima. 

• IL is capable of tracking changes. As an example, consider a learning 
model of the dynamics of a mechanical system. While this system gets 
older, its properties might slowly evolve and IL can track this type of drift. 

b. Advantages of Batch Learning (BL) 

• In IL, noise causes the weights to continuously oscillate around a local 
optimum, and they never converge to a constant stable value. This is not 
the case in BL, making it easier to analyze. 
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• Various acceleration methods can operate only in BL, such as some of the 
algorithms mentioned earlier (QuickProp, Conjugate Gradient). 

• Another benefit related to the absence of noise in BL is that the theoretical 
analysis of the weight dynamics and convergence rates are very simple. 
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III. DATA DESCRIPTION AND METHODOLOGY 


Vibration analysis is among the most powerful tools available for the detection 
and isolation of incipient faults in mechanical systems. Among the methods of vibration 
analysis in use today and under continuous study are broadband vibration monitoring, 
time-domain analysis, and frequency analysis. All have varying degrees of utility in 
machinery condition monitoring and diagnostics and all have characteristics that lend 
themselves particularly well to specific applications. Since the effectiveness of ANN is 
directly related to how effectively the chosen inputs define a particular decision space, 
the selection of the optimum vibration parameters for inputs to the ANNs is critical. 
Thus, a good understanding of elementary machinery diagnostics techniques is essential. 
A, SOURCES OF VIBRATION 

In mechanical systems, any mechanical component which periodically comes in 
contact with a second component to transmit an axial, radial, or torsional load is a 
potential source of mechanical vibration. In machines with a gear train, the principal 
components involved with load transfer will be its torsional power source, such as a 
motor; the gear meshes; the bearings; and those items that interconnect them, the shafts. 
Additionally, because vibrational isolation is seldom complete, additional extraneous 
sources of vibration will also be present. The diagnostician is generally interested in 
extracting the vibrations created by specific machinery components and ignoring the 
other sources as extraneous noise. In this study, we are particularly interested in the 
vibrations generated by the rotating machinery’s gears, bearings, and shafts. As such, the 
discussion will be limited to these sources of vibration. 

1. Gear Vibration 

In a gear train, the gear mesh is the dominating source of mechanical vibration. 
This vibration primarily stems from the non-uniformity in the transmission of angular 
motion from one gear to its mate. The non-uniformity of the angular motion occurs due to 
geometric deviations of the contact surfaces from the ideal involutes shape and the elastic 
deformation that any mechanical system undergoes when transmitting a load (Mark, 
W.D., 1998). Moreover, torque fluctuations and deflections of the gearbox can also be 
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sources of vibration in gears. Clearly, any damage that occurs to the gear contact surface, 
as well as other mechanical linkages to the gear mesh, will also have an effect on the 
gear’s vibration (Mattew, J., and Alfredson, R.J., 1987). 

2. Bearings 

Bearing vibrations occur for much the same reasons as gears. However, because 
bearings are not situated directly along the power transmission train and support largely 
static loads, they characteristically generate a small vibration signal until the damage 
inflicted upon them reaches advanced stages. Because of the low magnitude of these 
signals, they are often masked by much stronger gear-related signals. Partially because of 
this belated detection of trouble, antifriction bearings are among the most common causes 
of machinery failure in moderately sized machines. The frequencies associated with the 
bearing-related signals generally depend on the location of the damage, the dimensions of 
the bearings, and the shaft rotation speed. 

3, Shafts 

Shafts generally produce vibration signals at their rotational frequency and its 
harmonics. Shafts are also prone to a number of different faults, all of which register at 
the shaft rotative frequency. In the case of bent shafts and shafts misalignments, the 
second harmonic is the dominant frequency in 90 percent of the cases (Collacott, R.A., 
1979). Imbalances in the shaft or load characteristically generate a dominant signal at the 
shaft rotative frequency but there tends to be a phase shift as well. Mechanical looseness 
can also introduce increases in the shaft rotational frequency but also characteristically 
involves higher harmonics as well (Hewlett Packard, 1983). 

B, DATA COLLECTION 

The database used in this research was provided by Goodrich Corporation and 
collected through the IMD-HUMS installed on 30 UH-60L helicopters. The period of 
data collection starts at 9/22/2003 and stops at 7/31/2005. The total number of 
observations utilized was 36,742 and each observation consists of 169 fields. For the 
database’s structure and reference ease, each generator was assigned a number starting 
from 1 up to 66. Appendix A summarizes the allocation of generators among the 
helicopters and the number of the recorded observations for each generator. 
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C. SELECTING VARIABLES 

As in any prediction or forecasting model, the selection of appropriate model 
inputs is extremely important. However, in most ANN applications, less attention is 
given to this task. The main reason for this is that ANNs belong to the class of data- 
driven approaches, whereas conventional statistical methods are model-driven. In the 
latter, the structure of the model has to be determined first, which is done with the aid of 
empirical or analytical approaches, before the unknown model parameters can be 
estimated. Data-driven approaches, on the other hand, have the ability to determine which 
model inputs are critical, so there is less need for “...a priori rationalization about 
relationships between variables...” However, presenting a large number of inputs to ANN 
models and relying on the network to determine the critical model inputs usually 
increases network size. This has a number of disadvantages, such as decreasing 
processing speed; increasing the amount of data required to estimate the connection 
weights efficiently, and degrading the AAN performance. This is particularly true for 
complex problems, where the number of potential inputs is large and where no a priori 
knowledge is available to suggest which inputs to include. 

1, Input Vector 

According the vibration theory mentioned before and after an assiduous study of 
the entire data recorded by the IMD-HUMS, the variables which potentially could form a 
well-informed input vector—all of them or subsets—to the training process of this 
research model, are shown and briefly explained below (Goodrich Corporation, 1998): 

a. Torque 

Torque is a measure of how much force acting on an object causes that 
object to rotate. 

b. SO_1 (Shaft Order 1) 

SO_l is the once-per-revolution energy in the signal average and is used 
to detect shaft imbalance. 

c. SO_2 (Shaft Order 2) 

SO_2 is the twice-per-revolution energy in the signal average and is used 
to detect shaft misalignment. 
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d. SO_3 

S0_3 is the thrice-per-revolution energy in the signal average and is used 
to detect shaft disparity. 

e. Signal Average RMS 

Frequencies that are integer multiples of the basic shaft frequency will be 
enhanced by the averaging process and other frequencies will be relatively attenuated. 
Depending on the mechanical environment, the signal average generally requires about 
100 revolutions to converge to a usable waveform. 

/. Residual Kurtosis 

The residual analysis first removes all the strong tones from the signal 
average to produce a residual signal so as to minimize the interference of these strong 
tones. Here the process handles kurtosis, which measures the thickness of the tails of the 
distribution of bearing vibrations after the background signal has been removed (Harris, 
2002 ). 

g. Residual RMS 

The residual process deals with the Root Mean Square, which is the 
overall energy level of the vibration data. 

h. Side Band Modulation_l 

This analysis is designed to reveal any sideband activities that may be the 
result of certain gear faults, such as eccentricity, misalignment, or looseness. The 
indicator characterizes the degree of sideband modulation for the first sideband. 

i. Gear Distributed Fault 

This attribute is an effective detector for distributed gear faults, like wear 
and multiple tooth cracks. It is a dimensionless measurement calculated from the ratio of 
explained and unexplained variances of a vibration generated at the meshing of gears. 

7. G2_1 

G2_l is an algorithm developed by Goodrich Corporation to compute the 
ratio of the signal average peak to peak and the gear meshing energies. 

k. Residual Peak to Peak 

The residual process deals with the algebraic difference between the 
extremes of the vibration quantity. 
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/. Gear Misalignment_1 

Gear Misalignment_l is a dimensionless measurement resulting from the 
ratio of the energies of the vibrations produced when gears mesh (Harris, 2002). 

m. Ball Energy 

Ball Energy is the total energy associated with the bearing ball spin defect 
frequency and its harmonics. 

n. Cage Energy 

Cage Energy is the total energy associated with the bearing cage defect 
frequency and its harmonics. Usually it is detectable only at the later stage of a bearing 
failure, but some studies show that this indicator may increase before the others. 

o. Inner Race Energy 

Inner Race Energy is the total energy associated with the bearing inner 
race defect frequency and its harmonics. 

p. Outer Race Energy 

Outer Race Energy is the total energy associated with the bearing outer 
race defect frequency and its harmonics. 

q. Envelope RMS 

The main purpose of envelope analysis is to sum and normalize the six 
multiples of frequencies above or below the Root Mean Square value of the vibration. 

Table 1 enumerates the above potential predictors for later reference purposes. 


Predictors 

1 

Torque 

10 

G2_l 

2 

SOI 

11 

Residual Peak to Peak 

3 

S02 

12 

Gear Misalignment_l 

4 

S03 

13 

Ball Energy 

5 

Signal Average RMS 

14 

Cage Energy 

6 

Residual Kurtosis 

15 

Inner Race Energy 

7 

Residual RMS 

16 

Inner Race Energy 

8 

Side Band Modulation_l 

17 

Envelope RMS 

9 

Gear Distributed Fault 




Table 1. Potential Model Predictors 
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2, Output Vector 

In this research’s models, the output vector consists only of one variable, which is 
the operating condition of the generator. Each generator is assigned a binary value of 1 or 
0 to classify its known state. The value of 1 corresponds to generators removed for fault 
while the value of 0 corresponds to good generators. The fact that each generator is 
assigned a state of 0 (good) or 1 (bad) does not mean these generators are actually in the 
assigned state. The given state of 0 (good) or 1 (bad) is based only upon whether a 
generator was removed for fault or not, according the maintenance records. A generator 
with a hidden fault would be assigned a state of 0 (good). Similarly, a generator which 
was taken out for a malfunction and given a state of 1 (bad) could have been 
mechanically good (Willard, L., Klesch, G., 2005). Unlikely the previous work of 
Willard and Klesch (2005), the entire history of “good” and “bad” generators is used 
here. Thus most “bad” generators should have an initial period of “good” following by 
“bad” as they fail. 

The following table shows the generators that were confirmed as bad, either after 
maintenance or based on IMD-HUMS data, and assigned the value of 1 (Table 2). The 
failure of two of the generators, numbers 9 and 33, were detected during operation by a 
generator warning light. Faults in the remaining four generators, numbers 22, 31, 53, and 
56, did not trigger the generator warning light. However each of the four generators had 
unusually high SOI readings upon removal. Three of these generators, numbers 22, 31, 
and 53, showed evidence of fault or wear. The removal of generator number 56 resulted 
from the case of an identifiable buzz (Willard, L., Klesch, G., 2005). 
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Bad Generators—Reasons for Replacement 

Generator 

Comments 

9 

Generator failed during shutdown upon APU generator coming on during start. 

22 

SOI near 2 ips. After Spline Adapter Coupler replacement, SOI returned to 0.05 ips. 

31 

SOI at 3 ips. After Spline Adapter Coupler replacement, SOI remained high and so 
generator was replaced 

33 

Generator bad 

53 

SOI at 3 ips. After Spline Adapter Coupler replacement SOI returned below 0.05 ips. 

56 

SOI over 4 ips. Generator and Spline Adapter Coupler were replaced. 

Note: 

Replacement of generators 9 & 22 were made due to maintenance records while the rest due 
to Johnny Wright and Ground Station Team, IMD-HUMS Fault Detections, Goodrich 
Corporation. Draft 5/25/2005 (Ver 117) 


Table 2. Bad Generators—Reasons for Replacement 


D, DATA PREPROCESSING 

Data preprocessing is frequently used to analyze and transform the input and 
output variables to minimize noise, emphasize essential relationships, identify trends, and 
flatten the distribution of the variables to aid the ANN in learning the relevant patterns. 
Since ANNs are pattern matchers, the representation of the data is important in designing 
a successful network. In most datasets there is a large variability in the scale of range 
fields. To balance this effect of scale, range fields are transformed so that they all have 
the same scale. In Clementine, range fields are rescaled to have values between 0 and 1. 
The transformation used is: 


X.. - X. 


X, 


X. 


(2.18) 


where x. is the rescaled value of input field x for record i, v, is the original value of x 
for record i, is the minimum value of x for all records, and is the maximum 
value of X for all records. 

An additional problem for network representation of the utilized database was 
that, out of 36,742 total observations, only 1,477 cases referred to the bad generators. 
This fact directly affects the learning procedure of the network by creating a tendency to 
predict only good generators. For example. Table 3 gives results for a A AN using the 
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default settings in Clementine where the classification is perfect for “good” generators 
while the classification for the “bad” generators is that actually they are “good” too. 
Changing the ANN architecture consistently gives similar results; most observations are 
predicted as “good.” To correct this situation, the records of bad generators were 
replicated so that the ratio of good to bad generators was close to 1. We can consider this 
scheme as a weighting technique to emphasize the input vector of the bad generators and, 
concurrently, the information they might convey. The multiplication factors differed for 
each generator. They chose so that each bad generator had about the same number of total 
observations and so that the number of bad generators was about equal to the number of 
good generators. Table 4 describes the latter procedure of data manipulation, and Figure 
16 presents the setup of Clementine to create the new normalized database. 


ORIGINAL 

ALL DATA 

TRAINING 

PREDICTION ACCURACY 

SET 

W/0 GEN 

ACCURACY 

BAD 

GOOD 


9 

98.479 % 




22 

98.173 % 



36742 

31 

97.111 % 

0% 

100% 

observations 

33 

97.657 % 


53 

97.151 % 




56 

96.928 % 




Table 3. Training Set using only Original Data 
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Generator 

Observations 

Factor 

Result 

Total 

Ratio 


9 

435 

13 

5655 




22 

336 

16 

5376 



fl 

<u 

O 

31 

302 

18 

5436 

32737 


-o 

CQ 

33 

245 

22 

5390 

0.93 


53 

61 

55 

5390 




56 

98 

90 

5490 



Good Gen 

All 

35265 



Table 4. Data Multiplication of Bad Observations 



Q Server Local Server 


Figure 16. Clementine Preprocessing Data 
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E. DATA SETS 

In ANNs, it is a common practice to partition the database into three separate sets 
called the training, test, and validation sets. 

1, Training Set 

The training set is the largest set and is used by the ANN to learn the patterns that 
exist in the data. The training set used for this research consists of the observations of 
five (5) bad and fifty-six (56) good generators, for a total of about 60,000 records. 

2, Test Sets 

The test sets, varying in size from 10% to 30% of the training set, are used to 
evaluate the ability of a trained ANN to generalize to a new set of data. The researcher 
fits the parameters and the topology of the network that achieves the best results on a test 
set. Using the features of the software, the size of a test set was always taken to be at 25% 
of the training set. Periodically during the process of training, new test sets are selected 
from the training set. Although this sample was selected randomly from the training set, 
the same seed of (12345) was used, in order to duplicate results for different model fits. 
Figure 17 shows the format chosen for the training and testing sets. 



Figure 17. Model Architecture GUI 
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For the purposes of training, each record is considered as an observation. Thus it 
is likely that the test set contains records from each of the 61 generators. 

3, Validation Set 

The validation set is used last as a final check on the performance of the trained 
network. The size of the validation set should keep a balance between obtaining a 
sufficient sample size to evaluate a trained network and having enough remaining 
observations for both training and testing. The validation set consists of one (1) bad and 
four (4) good generators. The records from these five generators are not used in training 
the network. They are only used for validation. 

F. NETWORK ARCHITECTURE AND EVALUATION CRITERIA 

As a baseline, this analysis uses a model architecture consisting of one input layer 
containing the predictors for the model, one output layer giving the estimate of the 
probability that the record is bad, and one hidden layer. Clementine provides six training 
methods for building ANN models but for this research utilized only the following two: 
(Figure 17) 

• Prune: This method starts with a large network and removes (prunes) the 
weakest units in the hidden and input layers as training proceeds. This 
method is usually slow, but it often yields better results than other 
methods. 

• RBFN: The radial basis function network uses a technique similar to k- 
means clustering to partition the data based on values of the target field. 

Clementine incorporates several features to avoid some of the common pitfalls of 
ANNs, including sensitivity analysis, network accuracy, and feedback graph. With these 
options selected, a sensitivity analysis will provide information on which input fields are 
most important in predicting the output field, network accuracy will provide the 
percentage of records for which the prediction of the model matches the observed value 
in the data, and the feedback graph will depict the accuracy of the network over time as it 
learns (Clementine 10.0 Node Reference, 1999). Moreover, a Confidence level ($N- 
Binary) is provided for each observation after the prediction, which, in this model with a 
flag (0 or 1) output, is calculated by using the formula: 

$N-Binary = 210.5 - Raw Output I (2.19) 

where the RawOutput is the output unit value scaled so that it is between o and 1. 
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If the output unit value is below 0.5, the observation is predicted as 0 (false), and 
if it is 0.5 or above, the observation is predicted as 1 (true). For example, if the ANN 
prediction value is 0.72, the prediction is displayed as “true,” and the confidence will be 
2|0.5-0.72| = 0.44. A part of some output results is presented in Figure 18. 



j_ 0 1 0.710 15.971 2.065 0.262 0.078 5.276 

2 _ 0 1 0.753 16.062 1.967 0.217 0.051 5.063 

3 _ 0 1 0.748 16.309 1.992 0.215 0.050 5.132 

4 _ 0 1 0.785 16.370 2.128 0.263 0.078 5.435 

5 _ 0 1 0.633 16.413 1.791 0.196 0.046 4.622 

R n 1 n me icRfic i 7nn moo nnm ^ roo 


Figure 18. Clementine Prediction Table 
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IV. RESULTS AND DISCUSSION 


A, MODEL WITH ALL PREDICTORS 

The first attempt of this experiment was to predict the state of one (1) bad and 
four (4) good generators. The good generators were chosen randomly and form a constant 
group for the rest of the experiment. The bad generators were sequentially chosen, one at 
a time, in order to observe how the model reacts and how this might affect the learning 
procedure. Initially, all the predictors were used to form the input vector for the training 
procedure of the network (Table 1). 

Appendix B provides the structure of the models in Tables 4, 5 and 6 along with 
their sensitivity tables. The training sets used for those models consist of the observations 
of five (5) bad and fifty-six (56) good generators. The size of the test sets was always 
taken to be at 25% of the training set while the validation set consists of one (1) bad and 
four (4) good generators. 

It is obvious from Table 4 that all the models behave well through the learning 
process and their training accuracy, computed on the test set taken from the training set, 
is very high (above 99%). Moreover, the prediction accuracy of the models, for the group 
of the good generators in the validation set, is also high (above 86%). On the other hand, 
it is noticeable that four (4) models could not predict the bad generator in the validation 
set at all, or to be more specific, they predicted that the generator was good. These 
models are even unable to predict generators 9 and 33, which are certainly bad because 
they had been replaced due to total failure and not preventively. The most likely 
explanation is that the patterns for the bad generators are too close to those to the good 
ones. It is also likely that the patterns of predictors for bad generators differ among the 
bad generators. There are a variety of potential causes for these differences. It is plausible 
that variability among aircrafts, among generators, or even among placement of IMD- 
HUMS acidometers could cause differences in readings from generator to generator. We 
also expect that bad generators with different failure causes will have different vibration 
patterns. Projection pursuit implemented by the statistical software Ggobi, used to gain a 
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visual perspective of the relationship among the variables. Ggobi plots two-dimensional 
projections of multi-dimensional data and Figure 19 depicts some of the data leading to 
the assumption of a unique failure type for each “bad” generator. 



Figure 19. Ggobi screen for “bad” generators 


Attempting to explore and understand further the models’ behavior, we tested five 
(5) new models to predict the condition of the two generators, 31 and 53, which were 
well predicted from models 3 and 5 of Table 5. Each model includes the whole database 
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excluding those data of the validation set which are the “bad” generators in question and 
the “good” generators 4, 26, 42 and 66. For these models, we exclude from the training 
set the data of the pair referenced in table. 

The prediction accuracy for generator 53 drops from 78.69 % in the single model 
5, to a range of 77.05 % - 59.02 % in the paired models. This seems to indicate that all 
the bad generators contribute to the prediction (Table 6). 

In contrast, for generator 31, the accuracy of models 8 and 9 drops less than 10% 
and for model 10 is almost 0%. This might indicate that generator 31 is more closely 
related to those generators that form the corresponding pairs (Table 7). Once again, we 
notice that the previous assumptions of close bad and good generator patterns and the 
possibility of unique reasons for generator failures are valid. Additionally, we observe 
that any change made on the models didn’t affect the accuracy of good generators. 
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MODEL 

VALIDATION SET 

TRAINING ACCURACY (%) 

PREDICTION ACCURACY (%) 


BAD 

9 


0 



4 


90.73 

1 

GOOD 

26 

99.913 

96.85 


42 


100 



66 


100 


BAD 

22 


0 



4 


100 

2 

GOOD 

26 

99.721 

95.40 


42 


100 



66 


100 


BAD 

31 


89.74 



4 


100 

3 

GOOD 

26 

99.708 

95.66 


42 


100 



66 


100 


BAD 

33 


0 



4 


100 

4 

GOOD 

26 

99.821 

99.74 


42 


100 



66 


100 


BAD 

53 


78.69 



4 


100 

5 

GOOD 

26 

99.448 

99.77 


42 


100 



66 


100 


BAD 

56 


0 



4 


100 

6 

GOOD 

26 

99.376 

86.20 


42 


100 



66 


100 


Table 5. Models Predicting Single Generator 
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MODEL 

VALIDATION SET 

TRAINING ACCURACY (%) 

FIRST PREDICTION (%) 

SECOND PREDICTION (%) 


BAD 

53 


78.69 

77.05 


9 


99.31 

0 

7 


4 

89.976 

100 

100 

GOOD 

26 

97.77 

100 


42 


100 

100 



66 


100 

100 


BAD 

53 


78.69 

72.13 


22 


100 

0 

8 


4 

99.707 

100 

100 

GOOD 

26 

97.77 

99.47 


42 


100 

100 



66 


100 

100 


BAD 

53 


78.69 

59.02 


31 


100 

0.33 

9 


4 

99.478 

100 

100 

GOOD 

26 

97.77 

95.66 


42 


100 

100 



66 


100 

100 


BAD 

53 


78.69 

78.69 


33 


99.18 

0 

10 


4 

99.728 

100 

100 

GOOD 

26 

97.77 

99.87 


42 


100 

100 



66 


100 

100 


BAD 

53 


78.69 

70.49 


56 


100 

6.12 

11 


4 

99.640 

100 

100 

GOOD 

26 

97.77 

98.69 


42 


100 

100 



66 


100 

100 


Table 6. Models Predicting Pair of Generators Including 53 
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MODEL 

VALIDATION SET 

TRAINING ACCURACY (%) 

FIRST PREDICTION (%) 

SECOND PREDICTION (%) 


BAD 

31 


89.74 

97.68 


9 


99.54 

0 

12 


4 

99.830 

100 

100 

GOOD 

26 

95.66 

96.85 


42 


100 

100 



66 


100 

100 


BAD 

31 


89.74 

7.62 


22 


100 

0 

13 


4 

99.795 

100 

100 

GOOD 

26 

95.66 

96.98 


42 


100 

100 



66 


100 

100 


BAD 

31 


89.74 

1.99 


33 


100 

0 

14 


4 

99.780 

100 

100 

GOOD 

26 

95.66 

99.87 


42 


100 

10 



66 


100 

100 


BAD 

31 


89.74 

0.33 


53 


100 

59.02 

15 


4 

99.478 

100 

100 

GOOD 

26 

95.66 

95.66 


42 


100 

100 



66 


100 

100 


BAD 

31 


89.74 

82.12 


56 


100 

24.49 

16 


4 

99.509 

100 

100 

GOOD 

26 

95.66 

98.29 


42 


100 

100 



66 


100 

100 


Table 7. Models Predicting Pair of Generators Including 31 
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B, ARTIFICIAL TRAINING SETS 

Leaving out certain bad generators degrades the performance of the network when 
fitting the ANN. If including patterns of predictors for those bad generators is important 
for classification, then there may be other patterns of predictors for bad generators that 
have not yet been observed and are not included in the data set. 

In this section an attempt is made to predict patterns of bad generators not 
included in the training set. First assume that a “bad” generator is any pattern which is 
not like the “good” generators in the data set. With the large number of good generators 
in the data set and the fact that the predictors of the good generators seem to clump 
together in two dimensions, this seems like a reasonable assumption. The ANN classifies 
as “bad” any pattern which is not like the good generators of the training set. To 
accomplish this, artificial records of “bad” generators are constructed and included in the 
training set. These artificial records are constructed by simulating values of their 
prediction variables using uniform distributions with lower and higher limits taken to be 
respectively the minimum and maximum value for that predicted variable. Table 8 gives 
these values for each predictor. 

Uniform random numbers were generated in Excel. With Excel’s limited memory, 
only 65,000 uniform random numbers could be generated at one time. With seventeen 
(17) predictors, limiting the number of these artificial records to 65,000 makes for a 
sparse set of “not good” records. 
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MIN 

MAX 

AVERAGE 

STDV 

1 

Torque 

0.00000000 

107.28300000 

41.81788502 

18.13808013 

2 

S01 

0.00000214 

5.13701866 

0.756183502 

0.573224368 

3 

S02 

0.00004010 

0.50330044 

0.033243415 

0.032356201 

4 

S03 

0.00000232 

0.36395797 

0.024541051 

0.022035132 

5 

SignalAvgRMS 

0.00164739 

22.84940000 

5.789740929 

3.535674759 

6 

Residualkurtosis 

1.49883000 

5.65217000 

2.219889443 

0.493949931 

7 

ResiduaIRMS 

0.00056873 

22.78900000 

4.755598113 

3.765317474 

8 

SideBandModI 

0.01836880 

152.09200000 

0.885233195 

1.609240035 

9 

Geardistributedfault 

0.01041510 

0.99954100 

0.768569289 

0.195927016 

10 

G2 1 

4.22352000 

14511.30000000 

153.974274 

247.6395613 

11 

Residualpeaktopeak 

0.00340415 

69.12200000 

19.35698974 

11.52136988 

12 

GearMisalignmentI 

-61.47450000 

37.17800000 

-16.09491365 

9.546299354 

13 

BallEnergy 

0.00000528 

0.08470770 

0.002668935 

0.001975551 

14 

CageEnergy 

0.00001770 

0.17604000 

0.008915292 

0.009668373 

15 

InnerRaceEnergy 

0.00000187 

0.02487210 

0.001402642 

0.001161872 

16 

OuterRaceEnergy 

0.00000388 

0.05549500 

0.001653416 

0.001667602 

17 

EnvelopeRMS 

0.00151628 

5.00572000 

0.841563316 

0.386477268 


Table 8. Summary Statistics for each Predictor variable including the Minimum, 
Maximum, Average and Standard Deviation. 


The following table (Table 9) presents the results of forecasting generator 9 using 
the above idea of training a uniformly distributed artificial set, combined along with the 
rest of generators regardless of their condition. We can see that propagating the artificial 
training set, even up to 1.3 million (1.3M) observations, the model still has no ability to 
predict the real condition of the bad generator in question. It does while still having 
excellent prediction accuracy on the group of good generators. The model starts to have 
encouraging prediction accuracy (11.72 %) after the artificial set reaches 2.6M 
observations but, at this point, it becomes less effective on the group of good generators. 
As the artificial set reaches a size of 2.75 M observations, it has excellent prediction 
accuracy (100 %) on the generator in question but now the predictions for the good group 
become very poor. This suggests once again that the patterns of good and bad generators 
are closely related. We can also conjecture that, because of the large number of predictor 
variables, we are seeing the so called “curse of dimensionality.” This means that the 
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complexity of the model grows exponentially with the dimension, rapidly outstripping the 
computational and memory storage capabilities of computers. For this data training more 
than 2.75M observations become infeasible. 

The same method is applied to generator 33. The results are analogous and 
summarized in Table 10, while the structures of the networks are provided in Appendix 
C. 


ARTIFICIAL 

ALL DATA 

MODEL 

PREDICTION ACCURACY 

SET 

W/0 GEN 

ACCURACY 

BAD 

GOOD 

65,000 


93.676 % 

0% 


650,000 


98.140% 

0% 

Excellent 

1.30 M 

9 

99.101 % 

0% 


2.60 M 


99.137 % 

11.72% 

Average 

2.75 M 


99.243 

100% 

Poor 


Table 9. Artificial Training Set to Predict Gen 9 


ARTIEICIAL 

SET 

ALL DATA 

W/0 GEN 

MODEL 

ACCURACY 

PREDICTION ACCURACY 

EOR BAD 

EOR GOOD 

65,000 


91.983% 

0% 


650,000 


98.053 % 

0% 


1.30 M 

33 

98.915 % 

0% 

Excellent 

2.60 M 


99.209 % 

0.41% 


2.75 M 


99.537 % 

0.40% 



Table 10. Artificial Training Set to Predict Gen 33 
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C. STEPWISE PREDICTORS USAGE 

A simpler, but sometimes very effective, way of dealing with high-dimensional 
data is to reduce the number of dimensions. At this phase of the research we pursued a 
stepwise approach, similar to that of forward selection used for regression models. First 
were trained separate networks for each input variable. The network achieving the best 
training accuracy is then preserved. The effect of adding each of the rest of the inputs to 
this model in sequence is evaluated. This procedure is repeated for one, two, three, etc., 
predictors until the addition of extra predictors does not result in a significant 
improvement in model performance. Like any process, this approach has several 
disadvantages. The biggest are that it requires substantial computation and that it is also 
unable to capture the importance of certain combinations of predictors that might be 
insignificant on their own. 

This phase of the research was the most time-consuming and many single and 
combined predictors were evaluated, aiming to find the model that better classified all the 
generators of the validation set. In the following table (Table 11), for space and time 
saving reasons, we present only one model as representative of those with the best ability 
to classify correctly. 


MODEL 

TO 

PREDICTORS 

TRAINING 

ACCURACY 

PREDICTION ACCURACY 

BAD 

GOOD 

(4,26,42,66) 

Predict 

Gen9 

ResidualRMS 

SideBandModl 

ResidualPeakToPeak 

BallEnergy 

InnerRaceEnergy 

86.029 % 

39.08% 

>85 % 

Predict 

Gen22 

84.036 % 

26.49 % 

>83 % 

Predict 

Gen31 

87.345 % 

5.36 % 

>72% 

Predict 

Gen33 

85.377 % 

40.82 % 

>90% 
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MODEL 

TO 

PREDICTORS 

TRAINING 

ACCURACY 

PREDICTION ACCURACY 

BAD 

GOOD 

(4,26,42,66) 

Predict 

Gen53 

87.321 % 

36.07 % 

>73 % 

Predict 

Gen56 

86.386% 

2.04 % 

>69 % 


Table 11. Stepwise Good Generated Model 


This last phase comes to verify that no matter which strategy was applied, the 
patterns of good and bad generators are so closely related that no model of the structure 
that we trained could distinguish the different pre-assigned condition of those generators. 
The time that the one model was a good classifier for the bad generator, it failed to 
predict the good ones and via versa; if the model performed well on the good generators, 
it failed to recognize the bad ones. The model summarized in Table 11 achieved the best 
overall performance for the database of this research. Even though the overall prediction 
accuracy appears low for the records of “bad” generators, the pattern of predicted 
probability that a record is “bad”p, as a function of time is very different for “bad” 
generators than for “good” generators. As an example. Figures 20 and 21 give plots of p 
versus time for a “bad” and “good” generator respectively. The “good” generator has 
value of p close to zero, where as the “bad” generator p is vary considerably and show 
an increasing trend in time. Appendix D contains plots of p versus time for the rest of the 
generators in the validation set and sensitivity analysis of this model. 
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Time Sequence Observations 


Figure 20. p from “bad” Generator 9 
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Time Sequence Observations 
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Figure 21. p from “good” Generator 66 
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V. CONCLUSIONS AND RECOMMENDATIONS 


The emphasis in this thesis was to develop an ANN that would utilize the 
collected data from IMD-HUMS, manufactured by Goodrich Corporation, in order to 
discover patterns that would predict a potential failure of a UH-60L helicopter generator. 
Many different ANNs were evaluated for their success rate on this faulting diagnosis. 

The first models that this research shaped were trained, tested, and evaluated 
using only the data collected by IMD-HUMS. The whole database was normalized and 
populated accordingly. One (1) bad and four (4) good generators, which were left out of 
the training and test sets, were used for validation purposes for those models. The method 
was applied sequentially to each bad generator, always maintaining the same group of 
good generators. Two of the six models were considered as good classifiers: with 
accuracy above 78.69% and 89.74%, respectively. The other four models had zero ability 
to classify the bad generators, although they all predicted very well the group of good 
generators. 

The next phase of the experiment was to generate a uniformly distributed artificial 
data set, using it along with the original database to form a training/testing set for further 
classifications. Artificial sets up to 2.6M were created in an effort to capture the bad 
generator patterns. Because the learning phase was time -and resource- consuming, the 
researcher was limited to testing two generators, the ones originally replaced for failure 
(9 and 33). One model started to show some promising results by the time the researcher 
reached the computer’s limits. Obviously, the “curse of dimensionality” comes into play 
at this point of this research. 

During the last portion of this experiment, the researcher tried to deal with the 
multi-dimensional problem and, at the same time, shape a model with good 
generalization behavior. A stepwise strategy was followed and many models were trained 
with various combinations of predictors. Once again, the produced classifiers were not 
generally successful in generator prediction. 

Concluding, the researcher realized that in each phase of this experiment, the bad 
and good generators patterns are very closely related. This affects the learning procedure 
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of the network by blocking its ability to build a model capable of classifying the good and 
bad generators concurrently. Additionally, it is possible that the reason of failure for each 
bad generator is unique, so that the size of this database and the structure of those models 
are not capable of capturing those patterns in a generalized form. On the other hand, this 
research exploits many paths, identifies various issues about the classification of the UH- 
60L helicopter generators, and finally comes up with models capable of classifying a big 
portion of the database in question. 

ANNs are a category of artificial intelligence technology that mimics the human 
brain’s skill at identifying patterns. In theory, ANNs are capable of approximating any 
continuous function. Such flexibility makes for a potentially powerful forecasting tool but 
the big number of parameters that must be selected makes the design process difficult. In 
practice, building an ANN forecasting model involves a lot of trial and error. 
Consequently, the objective of this thesis was to provide a practical, non-technical 
introduction to structure an ANN forecasting model using real operating data of UH-60L 
helicopters. The success of ANN applications for an individual researcher depends on 
three key factors. First, the researcher must have the time, patience, and resources to 
experiment. Second, the ANN software must allow automated routines, such as walk- 
forward testing, optimization of hidden neurons, and testing of input variable 
combinations—either through direct programming or the use of batch/script files. Third, 
the researcher must maintain a good set of records that lists all parameters for each 
network tested. 

This research has verified that ANNs have a position in machinery condition 
monitoring and diagnostics. However, the limited nature of these results indicates that 
ANNs will not solve all machinery condition monitoring and diagnostics problems by 
themselves. They certainly will not completely replace conventional rule-based expert 
systems. Ultimately, it is anticipated that a symbiotic combination of these two 
technologies will provide the optimal solution to the machinery condition monitoring and 
diagnostics problem. 

Further future work can be conducted on building models with more than one 
hidden layer which can explain more complex data, but which requires experience in 
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manipulating the parameters of the utilized software and good a priori knowledge of the 
database itself. Furthermore, enhancing the current database with new real data, 
accompanied with proper maintenance records, will benefit and improve any future effort 
on predicting the generators’ conditions, regardless of the techniques that will be used by 
the researcher. 
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APPENDIX A 

TAIL 

#GENERATOR 

OBSERVATIONS 

FINAL CONDITION 

9126351 

1 

526 



35 

526 
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9226443 


9226446 
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9226453 
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9326500 
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Figure 23. Model 2 (Predict Bad 22 and Good 4, 26, 42, 66) 
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Figure 24. Model 3 (Predict Bad 31 and Good 4, 26, 42, 66) 
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Figure 25. Model 4 (Predict Bad 33 and Good 4, 26, 42, 66) 
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Figure 26. Model 5 (Predict Bad 53 and Good 4, 26, 42, 66) 
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Figure 27. Model 6 (Predict Bad 56 and Good 4, 26, 42, 66) 
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Figure 28. Model 7 (Predict Bad 53, 9 and Good 4, 26, 42, 66) 
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Figure 29. Model 8 (Predict Bad 53, 22 and Good 4, 26, 42, 66) 
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Figure 30. Model 9 (Predict Bad 53, 31 and Good 4, 26, 42, 66) 
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Figure 31. Model 10 (Predict Bad 53, 33 and Good 4, 26, 42, 66) 
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Figure 32. Model 11 (Predict Bad 53, 56 and Good 4, 26, 42, 66) 
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Figure 33. Model 12 (Predict Bad 31,9 and Good 4 ,26, 42, 66) 
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Figure 35. Model 14 (Predict Bad 31, 33 and Good 4, 26, 42, 66) 
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Figure 36. Model 15 (Predict Bad 31, 53 and Good 4, 26, 42, 66) 
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Figure 37. Model 16 (Predict Bad 31, 56 and Good 4, 26, 42, 66) 
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Figure 38. Model 17 (Predict Bad 9 Using Artificial Sets) 
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Figure 39. Model 18 (Predict Bad 33 Using Artificial Sets) 
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Stepwise Model Using 5 Predictors (Predict Bad 9) 
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Figure 41. 


Stepwise Model Using 5 Predictors (Predict Bad 22) 
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Figure 42. Stepwise Model Using 5 Predictors (Predict Bad 31) 
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Stepwise Model Using 5 Predictors (Predict Bad 33) 
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Figure 44. Stepwise Model Using 5 Predictors (Predict Bad 53) 
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Figure 45. 


Stepwise Model Using 5 Predictors (Predict Bad 56) 
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